Reverse Engineering of Proteasomal Translocation Rates 
We address the problem of proteasomal protein translocation and introduce a new stochastic model of the proteasomal digestion (cleavage) of proteins. In this model we account for the protein translocation and the positioning of the cleavage sites of a proteasome from first principles. We show by test examples and by processing experimental data that our model allows reconstruction of the translocation and cleavage rates from mass spectrometry data on digestion patterns and can be used to investigate the properties of transport in different experimental set-ups. Detailed investigation with this model will enable theoretical quantitative prediction of the proteasomal activity.

A macromolecular complex, the proteasome, is the central molecular machine for the degradation of intracellular proteins [1]. Proteasomes have a pivotal role in antigen processing, which prepares epitopes for the immune system [2]. They exist in cells as the free proteolytically active core, the barrel-shaped 20S proteasome (Fig. 2), and as associations of this core with regulatory complexes (PA700 or PA28) at its ends [3]. Here we consider the in vitro proteasomal digestion assays widely used in molecular biology and immunology to investigate proteasomes.

A protein (Fig. 1) enters the proteasome and is translocated into the central chamber, where it is cleaved into fragments at the cleavage sites. We assume that 6 cleavage sites are arranged along two rings (Fig. 2). The protein fragments produced are removed through the proteasome gates. The proteasomal translocation function can qualitatively change the expression of a specific fragment, e.g., an epitope, because modified translocation, and thus an increased time of residence near the cleavage terminal, changes the conditions of cleavage. Moreover, impairment of proteasomal degradation, probably due to translocation malfunction, might contribute to the pathology of various neurodegenerative conditions [4].

The mechanism of protein translocation remains unknown. It is also unknown whether translocation properties differ between proteasome types (constitutive or immuno-), with or without different regulatory complexes, and under different experimental conditions (concentration ratios, temperature, etc.). Only a few papers address the translocation problem, but these are either based on semi-phenomenological descriptions of the uptake and translocation of the protein [5, 6, 7] or suggest a transport mechanism hypothesis not yet verified experimentally [8]. On the other hand, there exist several tools [9, 11] to predict where the protein will be cleaved, but as numerous experiments show [?], these algorithms do not always work reliably. The reason is that these algorithms utilize experimental data that result from both a specific protein sequence and the translocation function, whereas the prediction is based only on the sequence and ignores the translocation function. In contrast to these approaches, here we introduce a stochastic model which allows one to reconstruct both the translocation and cleavage rates from mass spectrometry (MS) data on digestion patterns. The reconstructed features collected for a specific proteasome type can then be used for a reliable prediction of the fragment expression.

In our model of protein translocation and degradation 
by the proteasome we assume that: 

(1) The event rate of the protein shift by one amino acid (aa) into the proteasome (to the right in Fig. 2) depends only on the length $x$ of the protein forward end beyond the cleavage sites nearest to the proteasome chamber entrance used for protein infiltration (the left ones in Fig. 2); this event rate is given by the translocation rate function (TRF) $v(x) = v_x$. The backward motions of the entering strand are neglected. These assumptions do not impose significant restrictions on the physical mechanism of the translocation process: they are valid for Brownian drift in a tilted spatially periodic potential [10] as well as for the ratchet effect [8], etc. The TRFs of different proteasome species (20S, 26S, ±PA28 [3]) may differ.

(2) When the protein strand is close to the cleavage site, the event rate of the cleavage depends on the sequence of aa nearest to the cleaved peptide bond [11]. For the given protein, this conditional cleavage rate (CCR), $\gamma(\tau) = \gamma_\tau$, is a function of the bond number $\tau$ (Fig. 1); later on we use $\tau$ near the first ring of cleavage sites as a time-like variable.

(3) The peptides (cleaved parts of the degraded protein) leave the chamber through the proteasome gates. Because their mobility is higher than that of the protein, processed peptides leave the chamber quickly enough that both their possible further splitting and their influence on the protein transport can be neglected.

Let us now introduce the distribution $w(x|\tau)$, which is the probability that the protein forward end beyond the first ring of cleavage sites has length $x$ when the $\tau$th bond is near that ring, i.e., in our terms, at the discrete "time moment" $\tau$. We measure $x$ in aa; thus $x$ and $\tau$ are integers. To describe the "temporal" evolution of the distribution $w(x|\tau)$, we consider the shift of the protein strand into the proteasome by one aa, i.e., the transition $\tau \to \tau + 1$. Let us decompose $w(x|\tau+1)$ as

$$w(x|\tau+1) = \sum_j w_j(x|\tau+1),$$

where $w_j(x|\tau+1)$ are the contributions due to different scenarios of this transition. Along with $w(x|\tau)$, we keep track of $Q(n, m|\tau)$, the amount of the peptide $(n, m)$, which is the m-n subsequence of the degraded protein (Fig. 1), generated during the transition $\tau \to \tau + 1$.

FIG. 1: Peptide bonds are indexed with $\tau$, aa with $P_i$. [Schematic of the initial M-mer protein $P_M \ldots P_2 P_1$ with the entering end at $P_1$ and peptide bonds $\tau = 1, 2, \ldots, M-1$; the $(n, m)$ peptide is the m-n subsequence, bounded by the bonds $\tau = n$ and $\tau = m-1$.]

There are three possible elementary events:

(a) the protein strand shift: $x \to x+1$, $\tau \to \tau+1$; the event rate is $v_x$;

(b) the cleavage on the first ring of cleavage sites ($x = 0$): $x \to 0$, $\tau \to \tau$; the event rate is $\gamma_\tau$;

(c) the cleavage on the second ring of cleavage sites ($x = L$, where $L$ is the distance between the rings of cleavage sites, Fig. 2): $x \to L$, $\tau \to \tau$; the event rate is $\gamma_{\tau-L}$.

FIG. 2: Infiltration of a protein strand into the 20S proteasome: The scissors mark the positions of the rings of cleavage sites at $x = 0$ and $x = L$; the cleavage occurs via the attaching-detaching of the protein to cleavage sites (dark-grey color); thus, in the figure the bonds between $P_{\tau+1}$ and $P_\tau$ and between $P_{\tau-L+1}$ and $P_{\tau-L}$ may be cleaved. The first aa has index $\tau - x + 1$ after $\tau - x$ aa have been cut out (see inset).

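In terms of these elementary events, the possible scenarios of the transition $\tau \to \tau + 1$ are:

1) Elementary event (a). Its probability is

$$P_1(x|\tau) = \frac{v_x}{v_x + \gamma_\tau + \Theta(x-L-1)\,\gamma_{\tau-L}},$$

where the Heaviside function $\Theta(x<0) = 0$, $\Theta(x \ge 0) = 1$. In this scenario, $x \to x+1$, and

$$w_1(x+1|\tau+1) = P_1(x|\tau)\, w(x|\tau). \tag{1}$$

No peptides are generated;

2) Elementary event (b), which may not be followed by anything but the strand shift by one aa (as there is nothing to be cleaved). The probability of this scenario is

$$P_2(x|\tau) = \frac{\gamma_\tau}{v_x + \gamma_\tau + \Theta(x-L-1)\,\gamma_{\tau-L}}.$$

In this scenario, $x \to 1$, and

$$w_2(x|\tau+1) = \delta_{x,1} \sum_{x'} P_2(x'|\tau)\, w(x'|\tau). \tag{2}$$

The peptides cut out are

$$Q_2(\tau, \tau-x+1|\tau) = P_2(x|\tau)\, w(x|\tau); \tag{3}$$

3) Elementary event (c), which may be followed either by the strand shift, as in scenario 1), or by the first-ring cleavage, as in scenario 2). The probability of the first stage (c) is

$$P_c(x|\tau) = \frac{\Theta(x-L-1)\,\gamma_{\tau-L}}{v_x + \gamma_\tau + \gamma_{\tau-L}}.$$

After event (c), when $x \to L$, the distribution of the system states generated is

$$w_c(x|\tau) = \delta_{x,L} \sum_{x'=L+1}^{\infty} P_c(x'|\tau)\, w(x'|\tau),$$

and the peptides cut out are

$$Q_c(\tau-L, \tau-x+1|\tau) = P_c(x|\tau)\, w(x|\tau).$$

The subsequent events 1) or 2) should be regarded as the respective above-mentioned scenarios starting with the distribution $w_c(x|\tau)$, i.e.,

$$\begin{aligned}
w_{c1}(x|\tau+1) &= P_1(L|\tau)\, w_c(x-1|\tau) \\
&= P_1(L|\tau)\, \delta_{x,L+1} \sum_{x'=L+1}^{\infty} P_c(x'|\tau)\, w(x'|\tau),
\end{aligned} \tag{4}$$

$$\begin{aligned}
Q_{c1}(\tau-L, \tau-x+1|\tau) &= P_1(L|\tau)\, Q_c(\tau-L, \tau-x+1|\tau) \\
&= P_1(L|\tau)\, P_c(x|\tau)\, w(x|\tau),
\end{aligned} \tag{5}$$

$$\begin{aligned}
w_{c2}(x|\tau+1) &= \delta_{x,1} \sum_{x'} P_2(x'|\tau)\, w_c(x'|\tau) \\
&= \delta_{x,1}\, P_2(L|\tau) \sum_{x'=L+1}^{\infty} P_c(x'|\tau)\, w(x'|\tau),
\end{aligned} \tag{6}$$

$$\begin{aligned}
Q_{c2}(\tau-L, \tau-x+1|\tau) &= P_2(L|\tau)\, Q_c(\tau-L, \tau-x+1|\tau) \\
&= P_2(L|\tau)\, P_c(x|\tau)\, w(x|\tau),
\end{aligned} \tag{7}$$

$$\begin{aligned}
Q_{c2}(\tau, \tau-x+1|\tau) &= P_2(x|\tau)\, w_c(x|\tau) \\
&= \delta_{x,L}\, P_2(L|\tau) \sum_{x'=L+1}^{\infty} P_c(x'|\tau)\, w(x'|\tau).
\end{aligned} \tag{8}$$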
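As an illustration, the three branching probabilities $P_1$, $P_2$, and $P_c$ of the scenarios above can be evaluated in a few lines of Python. This is a minimal sketch rather than the authors' code: the function name `branching_probabilities` and the 1-indexed list layout (element 0 is an unused placeholder) are assumptions made here for readability, and the rate values in the usage lines are arbitrary.

```python
def branching_probabilities(x, tau, v, gamma, L):
    """Return (P1, P2, Pc) for strand length x at the "time moment" tau.

    v[x] is the translocation rate and gamma[tau] the conditional cleavage
    rate; both lists are 1-indexed (element 0 is an unused placeholder).
    """
    # The second ring can cut only if the strand reaches beyond it,
    # i.e. x >= L + 1 (this is the Heaviside factor in P1, P2 and Pc).
    g_second = gamma[tau - L] if x >= L + 1 else 0.0
    total = v[x] + gamma[tau] + g_second
    return v[x] / total, gamma[tau] / total, g_second / total


# Hypothetical rates for a 4-mer with ring distance L = 2.
v = [0.0, 1.0, 1.0, 0.5, 0.5]
gamma = [0.0, 0.2, 0.0, 0.3]
p1, p2, pc = branching_probabilities(3, 3, v, gamma, 2)
print(p1, p2, pc)  # the three values sum to one
```

For $x \le L$ the third probability vanishes and the remaining two reduce to $v_x/(v_x+\gamma_\tau)$ and $\gamma_\tau/(v_x+\gamma_\tau)$, which is the form used for $P_1(L|\tau)$ and $P_2(L|\tau)$ in Eqs. (4)-(8).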
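
Collecting Eqs. (1), (2), (4), and (6), we find the master equation

$$w(1|\tau+1) = \sum_{x=1}^{L} \frac{\gamma_\tau\, w(x|\tau)}{v_x + \gamma_\tau} + \sum_{x=L+1}^{\infty} \frac{\gamma_\tau\, w(x|\tau)}{v_x + \gamma_\tau + \gamma_{\tau-L}} + \frac{\gamma_\tau}{v_L + \gamma_\tau} \sum_{x=L+1}^{\infty} \frac{\gamma_{\tau-L}\, w(x|\tau)}{v_x + \gamma_\tau + \gamma_{\tau-L}}, \tag{9}$$

$$w(L+1|\tau+1) = \frac{v_L}{v_L + \gamma_\tau} \left[ w(L|\tau) + \sum_{x=L+1}^{\infty} \frac{\gamma_{\tau-L}\, w(x|\tau)}{v_x + \gamma_\tau + \gamma_{\tau-L}} \right], \tag{10}$$

$$w(x|\tau+1) = \frac{v_{x-1}\, w(x-1|\tau)}{v_{x-1} + \gamma_\tau + \Theta(x-L-2)\,\gamma_{\tau-L}}, \qquad x \ne 1,\ L+1. \tag{11}$$

Here $x = 1, 2, 3, \ldots, M$ and $\tau = 1, 2, 3, \ldots, M-1$, where $M$ is the length of the protein (Fig. 1).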

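Performing a numerical simulation of the degradation of an M-mer polypeptide, one should start at $\tau = 1$ with $w(x|\tau=1) = \delta_{x,1}$ and iterate Eqs. (9)-(11) till the last $\tau = M-1$. Additionally, the release of the last fragment from the chamber at the "time moment" $\tau = M$ should be taken into account: $Q(M, M-x+1|M) \to Q(M, M-x+1|M) + w(x|M)$. Hence, with $w(x|\tau)$ known for $\tau = 1, 2, \ldots, M$, one can sum the contributions of Eqs. (3), (5), (7), and (8) to evaluate the digestion pattern, i.e., the total amount $Q(n, m)$ of the peptide $(n, m)$ generated during the processing of a single polypeptide,

$$\begin{aligned}
Q(\tau_1,\tau_2) &= Q(\tau_1,\tau_2|\tau_1) + \Theta(M-\tau_1-L)\, Q(\tau_1,\tau_2|\tau_1+L) \\
&= \delta_{\tau_1,M}\, w(M-\tau_2+1|M) + \frac{\gamma_{\tau_1}\, w(\tau_1-\tau_2+1|\tau_1)}{v_{\tau_1-\tau_2+1} + \gamma_{\tau_1} + \Theta(\tau_1-\tau_2-L)\,\gamma_{\tau_1-L}} \\
&\quad + \delta_{\tau_1-\tau_2+1,L}\, \frac{\gamma_{\tau_1}}{v_L + \gamma_{\tau_1}} \sum_{x=L+1}^{\infty} \frac{\gamma_{\tau_1-L}\, w(x|\tau_1)}{v_x + \gamma_{\tau_1} + \gamma_{\tau_1-L}} \\
&\quad + \Theta(M-\tau_1-L)\, \frac{\gamma_{\tau_1}\, w(\tau_1+L-\tau_2+1|\tau_1+L)}{v_{\tau_1+L-\tau_2+1} + \gamma_{\tau_1+L} + \gamma_{\tau_1}};
\end{aligned} \tag{12}$$

here $1 \le \tau_2 \le \tau_1 \le M$. Since the protein can be cleaved starting both from the C- and from the N-terminal, the final digestion pattern is given by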



$$Q_{\rm fin}(\tau_1,\tau_2) = P_N\, Q_N(\tau_1,\tau_2) + P_C\, Q_C(M-\tau_2+1,\, M-\tau_1+1). \tag{13}$$

The subscripts indicate which terminal goes first; $P_N$ and $P_C = 1 - P_N$ are the probabilities of the degradation starting from the corresponding end.
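The forward calculation described above (start from $w(x|1) = \delta_{x,1}$, advance $\tau$ up to $M-1$, release the remainder at $\tau = M$, then mix the two entry directions) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it does the bookkeeping scenario by scenario instead of coding the closed-form sums; the names `digestion_pattern` and `final_pattern` and the 1-indexed list layout are hypothetical; all $v_x$ are assumed positive; the remainder at $\tau = M$ is assumed to be released whole, without a further second-ring cut at that last moment; and the C-terminal-first pattern is assumed to follow from the same TRF with the CCR read in reverse bond order.

```python
from collections import defaultdict


def digestion_pattern(v, gamma, L):
    """Amounts Q[(n, m)] of the peptides (n, m) produced from one M-mer.

    v[x], x = 1..M, is the TRF and gamma[tau], tau = 1..M-1, the CCR;
    both are 1-indexed lists whose element 0 is an unused placeholder.
    """
    M = len(v) - 1
    w = [0.0] * (M + 2)          # w[x] holds w(x|tau)
    w[1] = 1.0                   # initial condition w(x|1) = delta_{x,1}
    Q = defaultdict(float)
    for tau in range(1, M):      # transitions tau -> tau + 1
        w_next = [0.0] * (M + 2)
        for x in range(1, tau + 1):
            if w[x] == 0.0:
                continue
            g2 = gamma[tau - L] if x >= L + 1 else 0.0
            total = v[x] + gamma[tau] + g2
            # Scenario 1: strand shift, no peptide is generated.
            w_next[x + 1] += v[x] / total * w[x]
            # Scenario 2: first-ring cleavage, then the shift to x = 1.
            cut1 = gamma[tau] / total * w[x]
            if cut1 > 0.0:
                Q[(tau, tau - x + 1)] += cut1
                w_next[1] += cut1
            # Scenario 3: second-ring cleavage leaves a strand of length L,
            # which then either shifts or is cut at the first ring.
            cut2 = g2 / total * w[x]
            if cut2 > 0.0:
                Q[(tau - L, tau - x + 1)] += cut2
                p_cut = gamma[tau] / (v[L] + gamma[tau])
                w_next[L + 1] += (1.0 - p_cut) * cut2
                if p_cut > 0.0:
                    Q[(tau, tau - L + 1)] += p_cut * cut2
                    w_next[1] += p_cut * cut2
        w = w_next
    for x in range(1, M + 1):    # release of the last fragment at tau = M
        if w[x] > 0.0:
            Q[(M, M - x + 1)] += w[x]
    return dict(Q)


def final_pattern(v, gamma, L, p_n):
    """Mix N-first and C-first patterns with weights p_n and 1 - p_n."""
    M = len(v) - 1
    q_n = digestion_pattern(v, gamma, L)
    # Assumption: bond tau counted from the C-end is bond M - tau from the N-end.
    gamma_rev = [0.0] + [gamma[M - t] for t in range(1, M)]
    q_c = digestion_pattern(v, gamma_rev, L)
    q_fin = defaultdict(float)
    for key, q in q_n.items():
        q_fin[key] += p_n * q
    for (c1, c2), q in q_c.items():
        # Translate C-end numbering back to N-end numbering.
        q_fin[(M - c2 + 1, M - c1 + 1)] += (1.0 - p_n) * q
    return dict(q_fin)
```

A convenient sanity check for such a sketch is conservation of material: every realization cuts the M-mer into contiguous pieces, so the sum of (peptide length) times (amount) over all peptides should equal $M$.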

The digestion pattern $Q_{\rm fin}(\tau_1,\tau_2)$ is a functional of the TRF $v_x$ and the CCR $\gamma_\tau$. Utilizing MS data on the digestion pattern, one can determine the nonzero values of $\gamma_\tau$ (i.e., the positions of possible cleavage) and minimize the mismatch between $Q_{\rm fin}(\tau_1,\tau_2)$ and the MS data $Q(\tau_1,\tau_2)$ over $v_x$, the nonzero values of $\gamma_\tau$, and $P_N$ in order to reconstruct them. Note that $v_x$ and $\gamma_\tau$ are defined up to a constant multiplier, which should be determined from the degradation rate in real time, but not from the digestion pattern.
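The remark about the constant multiplier can be checked directly on the branching probabilities of the three scenarios. As a short worked step, multiply every rate by the same constant $c > 0$:

$$P_1(x|\tau) \;\to\; \frac{c\,v_x}{c\,v_x + c\,\gamma_\tau + \Theta(x-L-1)\,c\,\gamma_{\tau-L}} = \frac{v_x}{v_x + \gamma_\tau + \Theta(x-L-1)\,\gamma_{\tau-L}} = P_1(x|\tau),$$

and likewise for $P_2$ and $P_c$. Since the distribution and the peptide amounts are built from these probabilities only, the digestion pattern does not change under this rescaling, so the common multiplier cannot be recovered from it.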

FIG. 3: Test (a-c): Reconstruction of the translocation rate function $v_x$ and the conditional cleavage rates $\gamma_\tau$ for the 28mer peptide Kloe 320: a) the conditional cleavage probabilities and the aa sequence; b) the translocation rate function; c) the upper plot presents a set of digestion fragments (black bars: fragments utilized for the reconstruction, gray bars: not utilized), and the lower plot presents the amount of the corresponding fragment (diamonds: the reconstructed values $Q_{\rm fin}$, gray bars: the values of $Q$ utilized for the reconstruction). Experiment (d-f): Reconstruction of $v_x$ and $\gamma_\tau$ for the 28mer peptide Kloe 258 degraded by the 20S proteasome; $P_N = 54\%$.

In order to verify the robustness of the reverse engineering procedure, numerous tests have been performed. A typical test, presented in Figs. 3a-c, has been performed as follows. For given $v_x$ and $\gamma_\tau$, the digestion pattern $Q(\tau_1,\tau_2)$ has been evaluated. The result has been perturbed by noise: $Q(\tau_1,\tau_2) \to Q(\tau_1,\tau_2) + 10^{?}\, R_{\tau_1,\tau_2} \sqrt{Q(\tau_1,\tau_2)}$, where $R_{\tau_1,\tau_2}$ are independent random numbers uniformly distributed in $[-1, 1]$. The information about 1mer and 2mer fragments and about fragments whose relative amount is less than $5 \cdot 10^{?}$ has been omitted as being hardly acquirable in experiments [12]. The resulting $Q(\tau_1,\tau_2)$ has been used for the reconstruction of $v_x$ and $\gamma_\tau$. The original and reconstructed data for $\gamma_\tau$ (Fig. 3a) and $v_x$ (Fig. 3b) are in good agreement. The reconstructed $P_N = 0.49$ compares with the original $P_N = 0.50$.
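The preparation of the synthetic test data described above (noise proportional to the square root of the amount, then removal of 1mers, 2mers, and rare fragments) can be sketched as follows. This is only an illustrative sketch: the function name `perturb_and_filter` is hypothetical, the noise amplitude and the cut-off are left as free parameters with arbitrary values in the usage lines, and "relative amount" is assumed here to mean the fraction of the summed amounts, which is one possible reading.

```python
import math
import random


def perturb_and_filter(Q, amplitude, min_relative, seed=0):
    """Perturb a digestion pattern and drop hardly measurable fragments.

    Q maps a peptide (n, m) to its amount; amplitude scales the noise
    R * sqrt(Q) with R uniform in [-1, 1]; min_relative is the cut-off on
    the relative amount (assumed: fraction of the summed noisy amounts).
    """
    rng = random.Random(seed)
    noisy = {
        key: q + amplitude * rng.uniform(-1.0, 1.0) * math.sqrt(q)
        for key, q in Q.items()
    }
    total = sum(noisy.values())
    kept = {}
    for (n, m), q in noisy.items():
        length = n - m + 1
        if length <= 2:               # 1mers and 2mers are omitted
            continue
        if q / total < min_relative:  # rare fragments are omitted
            continue
        kept[(n, m)] = q
    return kept


# Hypothetical pattern and parameter values, for illustration only.
pattern = {(5, 1): 0.5, (2, 1): 0.3, (3, 3): 0.2, (9, 4): 1e-6}
print(perturb_and_filter(pattern, amplitude=1e-3, min_relative=1e-3))
```

Only the fragments that survive this filter would then enter the mismatch that is minimized in the reconstruction.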

Figs. 3d-f present the results of the reverse engineering from the experimental (in vitro) digestion pattern for the 28mer Kloe 258, which is the sequence of aa 101-128 of human Myelin Basic Protein, degraded by 20S proteasomes purified from lymphoblastoid cell lines, which express mainly the immunoproteasome (for materials and methods see [12]). The TRF appears to be monotonically decaying; the reconstructed probability of the degradation starting from the N-terminal is $P_N = 54\%$, meaning that the degradations from the N- and C-ends are almost equally probable in this case.

The suggested reconstruction method has some limitations. The reconstruction procedure for short polypeptides is very sensitive to measurement inaccuracy. Though the whole information on $Q(\tau_1,\tau_2)$ is not needed, the number of nonzero values of $Q(\tau_1,\tau_2)$ utilized for a reliable (noise-tolerant) reconstruction should considerably exceed the number of reconstructed parameters. For Kloe 258 the number of trustworthy and utilized values of $Q(\tau_1,\tau_2)$ is 19 (see Fig. 3f), which is only a little greater than the number of unknown parameters, which is 14. Hence, more accurate and comprehensive MS data on the digestion pattern are needed. Additionally, for short polypeptides the finishing stage of the degradation is relatively important, because at this stage the translocation rate is affected by edge effects (the backward end of the polypeptide gets inside the proteasome chamber) and is not the same as for the remainder of the polypeptide.

Fast and effective design of new intelligent drugs against immune and autoimmune diseases [?] is impossible without the development of a virtual immune system, by which these drugs can be tested in silico. One of the most important steps on this road is the prediction of the presentation profile, i.e., the number of epitopes, from transcription to presentation on the MHC class I complex and potential recognition by CD8+ T cells. To do this, one should be able to predict reliably the proteasomal digestion pattern (DP), which is determined by the sequence-specific cleavage preferences (SSCP), quantified by the CCR in our approach, and by the proteasomal TRF. Some attempts to make these predictions have been based on finding the correlations between SSCP and the final DP. These algorithms are available on the Internet [9, 11]; however, they are not always reliable because they ignore the dependence of the DP on the TRF. The significance of the mathematical method presented here lies in the possibility of finding dependencies between SSCP, TRF, and DP. Applying this method to various experimental data, one would be able to construct algorithms for a reliable prediction of the proteasomal DP.

In summary, we have proposed a model of the degradation of proteins by the proteasome which allows one to reconstruct the proteasomal transport function and cleavage strengths. The model is applicable to a broad variety of hypothetically possible translocation mechanisms [8, 10]. We have tested the model for relatively short (25-50mer) synthetic polypeptides as the most common case for in vitro experiments. Earlier, in [15], we described how peculiarities of the translocation function may lead to multimodality of the fragment length distribution even for $\gamma(\tau) = \mathrm{const}$. Here we have shown that the amount of each digestion fragment is not only determined by the cleavage map of the substrate but is also crucially affected by the nonuniformity of the translocation rate. The proposed methodology can be used in extensive analysis of already available MS data for the 20S proteasome and its associations with different regulatory complexes under different experimental conditions. The results of this analysis, specifically the shape of the translocation rate function and its variations for diverse proteasome species under different conditions, can give insight into the protein translocation mechanism inside the proteasome. Such an analysis can also elucidate the unanswered question of whether there is some preference for starting the degradation with the N- or C-terminal of the protein, and how this preference is quantitatively affected by regulatory complexes.

We thank S. Witt for fruitful discussions and the VW- 
Stiftung, PROTEOMAGE (FP6), the BRHE program, 
and "Perm Hydrodynamics" for financial support. 